Factor analysis is now widely used to identify and evaluate significant
underlying factors from masses of numerical data (1). The methods were developed
mainly by psychologists. By 1966 more than 2000 publications using factor analysis
had appeared O). Since then, applications have mushroomed into such fields as
psychiatry {^), medicine (5), anthropology and biology (6), chemistry (7), geology (8)
and national and international relations including military strategy (9).

Textbooks on factor analysis abound (10,11). Applications are facilitated by
the ready availability of computer programs for factor analysis in SPSS (12),
BMD and BMDP (13), DATA-TEXT (14), P-STAT (15), SSP (16), IMSL (17), TSP (18), and
other software packages for statistical analysis.

Factor analysis was designed to find the underlying determiners (factors)
that can account for the correlations between the different sorts (series) of data
measured. Recently we became concerned about whether or not the standard factor
analysis procedures do in fact succeed in this goal, and if not, what modifications
are needed to make them succeed. It is difficult or impossible to answer these
questions from examination of past applications because the true underlying factors
are generally not known in real problems. Thurstone's tests (19,20) were not
adequate, as we shall show below. Therefore we devised the following synthetic
problem where a correct set of underlying factors can be distinguished from an
incorrect set.

We attempted to analyze data on seven properties of right circular cylinders
(areas, masses, moments of inertia, etc.), all of which are determined precisely
by only two underlying types of factors (radii r and heights h). To be successful,
a factor analysis must separate these factors. By analysis of these data, it must
calculate a number (factor) for each cylinder that is a pure measure of cylinder
radius. This may be r, 2πr, log r, 2 log r or the like, but it must not be rh,
r/h or any other mixture or hybrid of r and h. It must calculate a second factor
for each cylinder that is a pure measure of cylinder height. This may be h, log h
or any other fixed function of h that is unaffected by r. It must not calculate
or indicate any additional factors. Success or failure in this simple synthetic
example is therefore clearly defined, and can be used to test the standard procedures
commonly employed and various modifications of them. We shall show that the standard
procedures fail to meet this challenge, but that success can be attained by adding
a novel kind of transformation (rotation), provided that there are no missing data.
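
A test problem of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' data or program: the text names just two of the seven properties explicitly (total area of the flat faces and electrical resistance between the ends), so the remaining rows of `EXPONENTS`, the random radii and heights, and the function names are all hypothetical. The sketch builds error-free log properties that are exactly linear in log r and log h, and confirms that principal components of the correlation matrix find exactly two modes.

```python
import numpy as np

# Hypothetical exponents (a_i, b_i) of r and h for seven cylinder properties:
# log(property_i) = a_i*log(r) + b_i*log(h) + c_i. Only the first and last rows
# are named in the text; the others are assumed for illustration.
EXPONENTS = np.array([
    [2.0, 0.0],    # total area of the flat faces, 2*pi*r^2
    [1.0, 1.0],    # lateral area, 2*pi*r*h
    [2.0, 1.0],    # volume (or mass), pi*r^2*h
    [4.0, 1.0],    # moment of inertia about the axis, proportional to r^4*h
    [1.0, -1.0],   # radius-to-height ratio, r/h
    [2.0, -1.0],   # end-to-end conductance, proportional to r^2/h
    [-2.0, 1.0],   # electrical resistance between the ends, proportional to h/r^2
])


def make_log_data(n_cylinders=10, seed=0):
    """Return (data, factors): error-free log properties and the true log r, log h."""
    rng = np.random.default_rng(seed)
    factors = rng.uniform(-1.0, 1.0, size=(n_cylinders, 2))  # columns: log r, log h
    consts = rng.normal(size=len(EXPONENTS))                 # arbitrary c_i
    return factors @ EXPONENTS.T + consts, factors


def variance_shares(data):
    """Fraction of variance carried by each principal component, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    eig = np.linalg.eigvalsh(corr)[::-1]
    return eig / eig.sum()


data, factors = make_log_data()
shares = variance_shares(data)
print(np.round(100 * shares, 1))  # only the first two entries are non-negligible
```

Finding two modes is the easy part. The sketch says nothing yet about whether the extracted factors are pure measures of radius and height, which is the real test described above.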

It has been recognized that the principal problem with the use of factor
analysis lies in discovering an appropriate transformation (rotation) capable of
separating the factors into pure (unhybridized) types. Rotation of axes and
variation of the angle between axes have been tried in various combinations,
but they have not been logically derived from the specific problem and do not give
correct parameters. We shall show that the transformation equations must be
derived from valid (true) subsidiary information related to the problem at hand.
Unfortunately, no one has derived transformation equations from such problem-related
subsidiary conditions previously. Instead, standard transformations such
as varimax rotations have generally been used. Since we shall show that these
fail to separate the factors in our simple example, we believe that they may have
failed to separate them in all previous applications.

It has been recognized that missing data are undesirable. Gaps in the input
data matrix introduce serious errors into the correlation coefficient matrix, well
in advance of the factor extraction and transformation procedures. Unfortunately,
many "factor analyses" have been carried out in spite of missing data. SPSS
provides three different options for coping with missing data. We shall show that
factors are not separated correctly when data are missing, using any of the standard
methods for handling missing data, even with our novel transformation which works
correctly with a full data set.
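
The effect of gaps on the correlation matrix can be seen in a small sketch, again under assumptions of our own rather than the authors' actual data: the exponents, the random log r and log h values, and the deletion pattern are hypothetical, and `pairwise_corr` is an illustrative helper, not an SPSS routine. The pattern removes exactly two of the seven properties for every cylinder (20 of 70 data), so listwise deletion leaves no complete cylinder, while pairwise deletion computes each correlation from a different subset of cylinders.

```python
import numpy as np

# Hypothetical exponents (a_i, b_i) of r and h for seven cylinder properties.
EXPONENTS = np.array(
    [[2, 0], [1, 1], [2, 1], [4, 1], [1, -1], [2, -1], [-2, 1]], dtype=float
)

rng = np.random.default_rng(1)
log_rh = rng.uniform(-1.0, 1.0, size=(10, 2))  # assumed log r, log h for 10 cylinders
full = log_rh @ EXPONENTS.T                     # 10 x 7 complete, error-free data

# Assumed deletion pattern: two of the seven properties missing for every cylinder.
rows, cols = np.indices(full.shape)
missing = (3 * rows + cols) % 7 < 2
holed = np.where(missing, np.nan, full)


def pairwise_corr(data):
    """Correlation matrix in which each entry uses only rows observed in both columns."""
    n_cols = data.shape[1]
    corr = np.full((n_cols, n_cols), np.nan)
    for i in range(n_cols):
        for k in range(n_cols):
            both = ~np.isnan(data[:, i]) & ~np.isnan(data[:, k])
            if both.sum() >= 3:
                corr[i, k] = np.corrcoef(data[both, i], data[both, k])[0, 1]
    return corr


complete_rows = int((~missing.any(axis=1)).sum())  # rows surviving listwise deletion
corr_full = np.corrcoef(full, rowvar=False)
corr_pair = pairwise_corr(holed)

print("missing data:", int(missing.sum()), "complete cylinders:", complete_rows)
print("largest change in any correlation:", np.abs(corr_pair - corr_full).max())
print("eigenvalues after pairwise deletion:",
      np.round(np.linalg.eigvalsh(corr_pair)[::-1], 3))
```

Every datum that remains is exact, yet the correlation matrix already differs from the true one before any factor is extracted.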

Our findings are presented under four headings: (1) selection of a problem
suitable for testing factor analysis, one where success or failure can be recognized
but otherwise similar to a real application; (2) the transformation needed after
the initial extraction process; (3) a demonstration that extraction followed by
this transformation yields correct parameters when no data are missing; (4) a
demonstration that seriously erroneous and misleading parameters can result if a
moderate fraction of the pertinent data are missing (e.g., owing to their never
having been measured, or to lost records, or incomplete returns, or deliberate
omission due to concern about accuracy) even if all of the used data are perfectly
accurate.

Selection of a Test Problem


The hazards in factor analysis first became apparent to us when we attempted
to apply principal components (^) to a chemical problem, to identify and evaluate
the two solvent characteristics most responsible for changes in thermodynamic,
kinetic and spectral data when the solvent is varied. Missing data (never measured)
were extensive in this problem, comprising 75% of the possible combinations of the
25 reactions and 60 solvents considered, but the data that were available covered
ranges of more than a power of ten for each reaction and were believed to be
individually accurate and precise to about ±15%. Principal components gave
solvent parameter values in absurd rank orders using any of the programmed procedures
for handling missing data. Therefore we undertook to examine the method further,
using a similar but simpler problem. Since logarithms of the measurements
constituting the data in our chemical problem appeared to be linear functions of two
solvent factors (anion-stabilizing ability and cation-stabilizing ability), we chose
a test problem where logarithms of the data were known to be accurately linear
functions of two factors, i.e., the cylinder problem that was solved correctly,
without or with missing data, by our "DOVE" (dual obligate vector evaluation) procedure
outlined in the previous article (2^).

Thurstone attempted to justify factor analysis by means of a cylinder
problem more than 30 years ago (19), in one of the few recorded tests (^) that
involved applying it to a problem with answers known in advance, but neither his
cylinder problem nor any of the other test problems had any missing data.
Thurstone's problem was based on diameters and heights carefully preselected to have
zero correlation in spite of small sample size. Real data samples seldom have this
property. Nevertheless, his selection of an application to cylinder properties was
an inspired choice for testing factor analyses because it embodies the most essential
features of a real two-mode problem in an especially simple and easily understood form.


The Transformation


Slopes ($a_i$, $b_i$) and factors ($x_j$, $y_j$) obtained initially from principal component
analysis with two modes ($n = 2$) correspond to formulation of the data as
$a_i x_j + b_i y_j + c_i$ under the conditions that the factors are orthogonal, i.e.
$\sum_j x_j y_j = 0$ summed over all $j$ for which data exist (e.g., for our 10 cylinders),
and that the slopes are normalized, i.e. $a_i^2 + b_i^2 = 1$ for each $i$ (property).
However, this orthogonality condition is undesirable, and a condition of zero
covariance or zero correlation coefficient between the factors would also be
undesirable, because none of these is likely to be even approximately true for a
small sample (only 10 cases). Furthermore, in many other problems none of these
would be a good approximation even for thousands or even an infinite number of
cases. For example, in our solvent effect problem a weak negative correlation
between the two factors is expected and perfectly reasonable. The second condition,
$a_i^2 + b_i^2 = 1$, is also generally untrue. Therefore we should replace these two
untrue conditions by two valid conditions.

Our transformation is carried out as follows. The old $a_i$ and $b_i$ values
from factor analysis are unstandardized to make them comparable with the
observed data by multiplication of each by the standard deviation of the data
for its reaction ($i$) from the mean of the data for that $i$. Corresponding
$x_j$ and $y_j$ values are calculated by least squares from the original data and
the newly unstandardized $a_i$ and $b_i$. Values of $c_i$ are calculated by simple least
squares to fit the observed data using $a_i x_j + b_i y_j + c_i$. Finally, new
parameters meeting the new conditions are calculated by the transformation equations
(10-15) using values derived previously (2, example using eq. 5). This transformation
of the factors and slopes to meet the new six subsidiary conditions is
necessarily considerably more complicated than the usual "rotations" of factor
analysis which only rotate the axes in the plane on which the slopes are displayed.
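
The two conditions that principal components builds in (orthogonal factors and normalized slopes) can be checked numerically. The sketch below is illustrative only and assumes hypothetical exponents and random log r and log h values. It does not implement the authors' transformation equations (10-15), which are not reproduced in this article. It shows that the extracted factors satisfy both conditions exactly, whereas the true factors and true exponents of the same sample generally satisfy neither.

```python
import numpy as np

# Hypothetical exponents (a_i, b_i) of r and h for seven cylinder properties.
EXPONENTS = np.array(
    [[2, 0], [1, 1], [2, 1], [4, 1], [1, -1], [2, -1], [-2, 1]], dtype=float
)

rng = np.random.default_rng(2)
true_factors = rng.uniform(-1.0, 1.0, size=(10, 2))  # assumed log r, log h
data = true_factors @ EXPONENTS.T                     # complete, error-free data

# Principal components of the correlation matrix of the standardized data.
z = (data - data.mean(axis=0)) / data.std(axis=0)
eigval, eigvec = np.linalg.eigh(z.T @ z / len(z))
top = np.argsort(eigval)[::-1][:2]
slopes = eigvec[:, top] * np.sqrt(eigval[top])        # (a_i, b_i), shape (7, 2)
scores = z @ eigvec[:, top] / np.sqrt(eigval[top])    # (x_j, y_j), shape (10, 2)

cross_pc = scores[:, 0] @ scores[:, 1]                # forced to zero by the method
slope_norms = (slopes ** 2).sum(axis=1)               # forced to one by the method

centered = true_factors - true_factors.mean(axis=0)
cross_true = centered[:, 0] @ centered[:, 1]          # generally not zero
true_norms = (EXPONENTS ** 2).sum(axis=1)             # generally not one

print("sum of x_j*y_j: extracted", cross_pc, "true", cross_true)
print("a_i^2 + b_i^2: extracted", np.round(slope_norms, 6), "true", true_norms)
```

The extracted slopes and factors reproduce the standardized data exactly, which is why a good fit alone cannot reveal that the wrong conditions were imposed.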

To fit an equation of the form $a_i x_j + b_i y_j + c_i$ to experimental data
uniquely, six independent subsidiary conditions must be specified (2, phase 2).

We  choose  Xj”^*  —3*^  four  relatively  unimportant  ones  (which 

define  reference  points  and  unit  sizes).  For  the  two  critical  statements,  we  again 
choose  and  a^"£j^/2  for  logical  reasons  explained  previously  (2,  example). 
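
The count of six can be checked by a short argument. The following is a sketch in notation of our own: the data symbol $p_{ij}$, the matrix $M$ and the shifts $s$, $t$ are assumptions introduced for illustration and do not come from the original equations.

```latex
% Two-mode model; p_{ij} is an assumed symbol for the datum of property i, cylinder j.
p_{ij} = a_i x_j + b_i y_j + c_i
% Any invertible affine change of the factors,
\begin{pmatrix} x_j' \\ y_j' \end{pmatrix}
  = M \begin{pmatrix} x_j \\ y_j \end{pmatrix}
  + \begin{pmatrix} s \\ t \end{pmatrix},
\qquad M \in \mathbb{R}^{2 \times 2}, \quad \det M \neq 0,
% is compensated exactly by the change of slopes and intercepts
\begin{pmatrix} a_i' & b_i' \end{pmatrix}
  = \begin{pmatrix} a_i & b_i \end{pmatrix} M^{-1},
\qquad c_i' = c_i - a_i' s - b_i' t,
% so that every datum is reproduced unchanged:
a_i' x_j' + b_i' y_j' + c_i' = a_i x_j + b_i y_j + c_i = p_{ij}.
% M contributes 4 free quantities and (s, t) contributes 2, so 4 + 2 = 6
% independent subsidiary conditions are needed to fix the parameters uniquely.
```

A pure rotation of the axes uses only one of these six degrees of freedom, which is consistent with the statement that the needed transformation is more complicated than the usual rotations.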


Factor Analysis with No Missing Data

Table 1 and Figure 1 show selected parameters after various procedures.
We have plotted slopes here rather than factors because slopes (factor loadings) are
more commonly reported by users of factor analysis than the factors (factor scores)
themselves. From a complete data set, correct slopes are obtained after our transformation
(column 4), although the values before transformation (column 2) or after
standard rotations seem quite different. Values in parentheses are those assigned
by the subsidiary conditions. Use of instead of a^'aj^/2 as the sixth condition
yields identical values for all the parameters. Introduction of small (~2%) random
errors into the 70 data causes only small (~2%) errors in the factors and slopes.

Since most of the thousands of publications applying factor analysis have
certainly not used our complicated transformation or its equivalent, but only varimax
or other similar kinds of rotations contained in statistical analysis computer packages
(12-18), it is instructive to note what kind of results such standard factor analyses
give.

Assumptions built into the principal components direct solution cause the
distribution of data in space to determine the order in which modes are extracted and
force the factors to be orthogonal and the sum of squared slopes to be unity, although
these conditions may not correspond at all with reality. Other extraction or direct
solution methods assume other conditions, similarly arbitrary or untrue. The
very popular varimax rotation technique then "simplifies" the distribution of slopes
so as to concentrate each slope on a few i's and in one mode, enlarging slopes for
those i's and reducing them for others in that mode, again regardless of whether this
makes physical sense or not. Other rotation techniques enhance this compartmentalized
maldistribution of slopes even more.
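
For readers who wish to reproduce this kind of behavior, the following is a minimal sketch of a varimax rotation applied to two-mode principal component slopes. It uses the commonly published SVD-based iteration, which is an assumption on our part and not necessarily the code in SPSS or the other packages cited. The exponents and the random log r and log h values are again hypothetical. The rotation is orthogonal and chosen only to "simplify" the slopes, so nothing in it refers to the physics of cylinders.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a loadings matrix toward the varimax criterion."""
    n_rows, n_cols = loadings.shape
    rotation = np.eye(n_cols)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Target matrix of the standard SVD-based varimax iteration.
        target = rotated ** 3 - rotated * (rotated ** 2).sum(axis=0) / n_rows
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_objective = s.sum()
        if new_objective <= objective * (1.0 + tol):
            break
        objective = new_objective
    return loadings @ rotation, rotation


# Hypothetical exponents (a_i, b_i) of r and h for seven cylinder properties.
EXPONENTS = np.array(
    [[2, 0], [1, 1], [2, 1], [4, 1], [1, -1], [2, -1], [-2, 1]], dtype=float
)
rng = np.random.default_rng(3)
data = rng.uniform(-1.0, 1.0, size=(10, 2)) @ EXPONENTS.T  # assumed log r, log h

# Two-mode principal component slopes from the correlation matrix.
eigval, eigvec = np.linalg.eigh(np.corrcoef(data, rowvar=False))
top = np.argsort(eigval)[::-1][:2]
slopes = eigvec[:, top] * np.sqrt(eigval[top])

rotated, rotation = varimax(slopes)
print(np.round(slopes, 2))   # before rotation
print(np.round(rotated, 2))  # after varimax; compare row 1 with its true b of zero
```

Because the rotation is orthogonal, the sum of squared slopes for each property is unchanged, so the rotated slopes fit the data exactly as well as the unrotated ones whether or not they are correct.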

Columns 2-4 show that 100% of the variance is explained (86.2% by the first
mode, 13.8% by the second). Thus, principal components calculates correctly that
there are two modes. However, this has no bearing on the correctness of the unrotated
or rotated slopes.

Column 3 shows the result of varimax rotation, because this is the rotation
method used in most published applications (^). The second-mode slopes are
positive for i = 2, 4 and 6, as they must be if they are to make any physical sense,
because the values of these properties do increase with increasing cylinder height.
However, they certainly appear wrong for i = 1 and 7. The sensitivity of the first
property, logarithm of the total area of the flat faces, to the second factor,
logarithm of cylinder height, is deduced to be also positive and relatively large
(0.56), although we happen to know that it is in fact exactly zero. The sensitivity
of the last property, the log of electrical resistance between the ends, to log h
is deduced to be even larger, near unity, but negative (-0.82). Accordingly, electrical
resistance of wires might be expected to decrease with increasing length, with a
nearly reciprocal dependence. This would be a real boon for long-range power transmission
and could lead to some interesting science fiction, but electrical engineers
should not be misled by these results. Neither should those who would save energy
be misled, by a thermal resistance analogy, into using close-packed lateral arrays
of wide but very short cylinders (discs) for space-saving thermal roof or wall
insulation in building construction.

Experimental verification of hypotheses is sufficiently practicable in
physics and engineering that there has been little desire for and use of factor analysis
in those fields. The greatest dangers are therefore in psychology, psychiatry,
medicine, and political science, where experimental design is more difficult and
where so many conclusions have accordingly been drawn from just such standard factor
analyses with conventional rotations. It seems distinctly possible that progress
in those fields has been hampered rather than helped in the past by such numerology,
because such misleading results are obtained even when no data are missing. Such
answers can be worse than no answers at all. Our test might be criticized because
of the small number of types of factors in our example (n = 2, only two modes of
interaction), whereas the problems to which factor analysis is applied usually have n
equal to 3 or more. However, the untrue conditions corresponding to these artificial
manipulations of the slopes, which are so obviously invalid in our example, are
still assumed when there are more factors, although the absurdity of the results is
then obscured by their greater complexity.

To obtain meaningful slopes (column 4) from factor analysis we have had to
resort to our "transformation", instead of any conventional kind of rotation, in
order to replace the untrue conditions by true ones, even when there are no missing
data. Unfortunately, in other studies where factor analyses have been used, untrue
conditions have generally not been replaced by a subsequent transformation
incorporating true subsidiary conditions. Results of such studies are therefore
unreliable except under special circumstances (22).

Factor Analysis with 20 Missing Data

Parameters deduced from 50 (accurate) data instead of 70, also with our
transformation using as the sixth subsidiary condition, are shown in
columns 6-10 of Table 1. The 20 data that are missing (randomly deleted) are indicated
in Table 2 (2^). The available factor analysis programs such as those in SPSS (12)
provide options for the handling of missing data: listwise deletion, replacement
by zeros, or pairwise deletion. Since listwise deletion eliminated all j except
j = 3, and replacement by zeros gave worse results than pairwise deletion, columns 5-6
were obtained via the third option. The correlation coefficient between the 50
observed and calculated data is now only 0.833. The rank order for $b_i$ is now
5<4<1<6<2<3<7 instead of the true order, 5<1<2=3=4=6=7, putting i = 4 between 5 and 1
instead of above them. The diamonds (◇) shown for several cylinders in Fig. 1 are
not acceptably close to the circles. The calculated order of increasing log h (= $y_j$)
is 7<3<10<4<9<1<2<6<8<5 instead of the true order, 7<9<3<10<1<2<8<4<5<6. With
as the sixth subsidiary condition, the calculated order of increasing log h
is still different, namely 7<4<10<2<3<9<6<1<8<5. Such results based on accurate but
incomplete data could be dangerously misleading in real problems, and therefore
worse than useless.

Although SPSS does not provide convenient facilities for substituting the mean
of data for an i in place of missing data for that i, use of means is claimed by
many authors to be preferable to pairwise deletion. Column 7 shows the result (for
a^-aj^/2). The correlation coefficient between observed and calculated data for the
50 is now 0.813, not much different in this example from that with pairwise deletion.
The $a_i$ parameters differ by more than 50% for i = 2, 4 and 6, although these should
all be identical. A partial rank order for log h is 1<3<10<4<2<6<8<5 vs. the true order,
3<10<1<2<8<4<5<6. The percent of the variance explained by increasing numbers of
modes is 68.5 (1), 82.9 (2), 94.2 (3), 97.8 (4), 99.4 (5), 99.9 (6) and 100.0 (7),
which would mislead us into believing that at least three modes are involved and
should be considered, if we did not know that there are only two.
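
The appearance of spurious extra modes after replacement by means is easy to reproduce in outline. The sketch below is a hedged illustration with hypothetical exponents, random log r and log h values, and an assumed deletion pattern (two properties missing per cylinder, 20 of 70 data). It will not reproduce the percentages quoted above, which come from the authors' own data. It compares the cumulative percent of variance explained for the complete data with that for the mean-filled data.

```python
import numpy as np

# Hypothetical exponents (a_i, b_i) of r and h for seven cylinder properties.
EXPONENTS = np.array(
    [[2, 0], [1, 1], [2, 1], [4, 1], [1, -1], [2, -1], [-2, 1]], dtype=float
)
rng = np.random.default_rng(4)
full = rng.uniform(-1.0, 1.0, size=(10, 2)) @ EXPONENTS.T  # complete, error-free

# Assumed deletion pattern: 20 of the 70 data, two per cylinder.
rows, cols = np.indices(full.shape)
missing = (3 * rows + cols) % 7 < 2
holed = np.where(missing, np.nan, full)

# Replace each missing datum by the mean of the observed data for that property.
filled = np.where(missing, np.nanmean(holed, axis=0), holed)


def cumulative_percent_variance(data):
    """Cumulative % of variance explained by 1, 2, ... principal components."""
    eig = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    return 100.0 * np.cumsum(eig) / eig.sum()


cum_full = cumulative_percent_variance(full)
cum_filled = cumulative_percent_variance(filled)
print("complete data:", np.round(cum_full, 1))
print("mean-filled:  ", np.round(cum_filled, 1))
```

With complete data two modes account for all of the variance. After mean substitution they no longer do, although only two factors generated the data.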

In a relatively expensive but still futile attempt to find a practical way
to obtain correct parameters from these 50 accurate data by factor analysis, we resorted
to iteration of the combination of principal components extraction plus the complete
transformation described above. Missing data for any i were replaced by the mean of
data for that i before extraction in the first cycle, but by data predicted from the
latest parameters (after transformation) in each subsequent cycle. This iterative
procedure has serious disadvantages compared with DOVE. First, it expands each
multiple regression to involve the latest estimates of all the missing data as well
as the measured data, increasing the number of calculations involved in the time-consuming
summations by 40% in the present example, and fourfold in our solvent effect
problem where 75% of the data are missing. Second, although most of the parameters
appear to be converging, it still gives a serious number of bad parameter
values after five iterations (after 10 job steps and computer charges 10 times those
required to obtain correct data and parameters to more than 6 decimal places by DOVE).
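
The iterative scheme can be outlined as follows. This sketch makes several simplifying assumptions that should be kept in mind: it fits two modes by a singular value decomposition of the column-centered data rather than by the correlation-based extraction used here, it omits the transformation step entirely, and its exponents, factor values and deletion pattern are hypothetical. It is therefore not the authors' procedure, only the general pattern of filling with means first and with predicted data thereafter.

```python
import numpy as np

# Hypothetical exponents (a_i, b_i) of r and h for seven cylinder properties.
EXPONENTS = np.array(
    [[2, 0], [1, 1], [2, 1], [4, 1], [1, -1], [2, -1], [-2, 1]], dtype=float
)
rng = np.random.default_rng(5)
full = rng.uniform(-1.0, 1.0, size=(10, 2)) @ EXPONENTS.T  # complete, error-free

# Assumed deletion pattern: 20 of the 70 data, two per cylinder.
rows, cols = np.indices(full.shape)
missing = (3 * rows + cols) % 7 < 2
holed = np.where(missing, np.nan, full)


def two_mode_fit(data):
    """Least-squares fit of column means plus two modes to a filled matrix."""
    mean = data.mean(axis=0)
    u, s, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean + (u[:, :2] * s[:2]) @ vt[:2]


# First cycle: missing data replaced by the mean for that property.
filled = np.where(missing, np.nanmean(holed, axis=0), holed)
errors = []
for cycle in range(5):
    fitted = two_mode_fit(filled)
    # Later cycles: missing data replaced by predictions; measured data are kept.
    filled = np.where(missing, fitted, holed)
    errors.append(np.sqrt(np.mean((filled[missing] - full[missing]) ** 2)))

print("rms error of estimated missing data per cycle:", np.round(errors, 3))
```

In a real problem the true values of the missing data are unknown, so the error printed here could not be computed. Only a test problem with known answers allows the progress of the iteration to be judged.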

At this point, the % variance explained has leveled off, but the correlation
coefficient has begun to drop, as shown in columns 7-10 of Table 1. Lest anyone
believe that these parameters are now nevertheless good enough because 98.7% of the
variance is explained, we should note that the correlation coefficient
is only 0.917 and that the fit to the observed data is becoming
worse, that cylinder 1 is still deduced to be shorter than cylinder 10 (log $h_1$ < log $h_{10}$)
whereas it is actually taller, the sensitivities to log h appear to vary more than
50% for properties such as 2, 4, 6 and 7, for which they are truly identical, and
those for 4, 5 and 6 are getting worse. Doubtless there would be other inversions in
rank orders if the distribution of the missing data were more biased (more associated
with particular i's and j's), as they often are in real problems, rather than being
random and relatively uniform as in this example. The squares (□) in Fig. 1 show
that the $b_i$ parameters for i = 4 and 6 are bad and currently becoming worse. It would
be interesting to follow this progression to its limit, where all values might possibly
end up correct, but they are changing too slowly to make this practicable.


From these and other experiences with iterative factor analyses, we have
found that it is critically important not to rely on convergence or constancy
of any missing datum, parameter, % variance, or correlation coefficient as a
criterion of correctness, but to test any proposed program by a problem like the
present example that has its answers known in advance.

We have not found any published literature documenting a claim that correct
results have ever been obtained by an application of principal components or any
other correlation coefficient-based factor analysis to data sets with missing data.

We conclude that although DOVE is able to solve such problems by simply omitting
missing data, standard factor analyses can neither omit them nor obtain satisfactory
estimates of the missing data or parameters by any efficient successive approximation
process.

How to Solve Problems with Missing Data

Correct parameters cannot be obtained by principal components or any other
kind of factor analysis based on correlation coefficients when data are missing.

Usually the best way to obtain correct parameters is to measure the missing data to
obtain a full data set, or narrow the scope of the study to a full subset having no
missing data and then use principal components analysis followed by a valid
transformation. If neither of these is practicable, we recommend use of the iterative
DOVE procedure illustrated in the previous article, which works correctly in spite
of missing data.

Reproduction of the data is not a sufficient test that a procedure is correct,
because an infinite number of sets of answers (parameters) are consistent with the
data. It is indispensable to test any new or untested procedure on a problem having the
same number of modes and with answers known in advance, because errors in assumptions
or logic are almost certain to escape detection if only real problems with unknown
answers are analyzed.

Summary

Principal components factor analysis is tested for its reliability, using
a problem with known answers. Even when test data are complete (e.g., 70 data on
7 properties, dependent on both radii and heights of 10 cylinders), such analyses
followed by varimax or other standard rotations give incorrect rank orders for factors
(factor scores for radius and height for each of the 10 cylinders) and sensitivities
(factor loadings for each of the 7 properties). When no data are missing, a
transformation incorporating valid subsidiary conditions can be used instead of
such rotations to obtain correct factors and sensitivities. However, when a
moderate number (e.g., 20, or 29%) of the possible data are missing (randomly
deleted), factors and sensitivities can have wrong rank orders and therefore be
misleading even with this transformation. When data are missing, standard factor
analysis is evidently unreliable and should be replaced by another method, such as
that in the preceding article.

Table 1. Second-mode slopes, b, from principal components

Legend for Figure 1

Figure 1. A plot of correct relative sensitivity to log height vs.
relative sensitivity to log radius for 7 cylinder properties, calculated from
70 data by principal components followed by a valid transformation (○).
Also shown are the most seriously incorrect values calculated similarly except
from 50 true data (with 20 data or 29% of the possible data missing) via
pairwise deletion (◇), replacement by means (△), or five subsequent iterative
replacements (□). Other calculated points falling closer to the correct
values have been omitted.


